Abstract

Bayesian networks (BNs) are used for inference and sampling by exploiting conditional independence among random variables. Context-specific independence (CSI) is a property of graphical models where additional independence relations arise in the context of particular values of random variables (RVs). Identifying and exploiting CSI properties can simplify inference. Some generative network models (models that generate social/information network samples from a network distribution $P(G)$), with complex interactions among a set of RVs, can be represented with probabilistic graphical models, in particular with BNs. In the present work we show one such case. We discuss how a mixed Kronecker Product Graph Model can be represented as a BN, and study its BN properties that can be used for efficient sampling. Specifically, we show that instead of exhibiting CSI properties, the model has deterministic context-specific dependence (DCSD). Exploiting this property focuses the sampling method on a subset of the sampling space, which improves efficiency.


Introduction 


In the last few decades, Bayesian networks (BNs) (Pearl 1988) have grown from a theoretical approach for modeling joint distributions into a powerful tool that can be applied to many real-world problems, owing to the relative ease of estimation and inference. Specifically, a BN is a directed acyclic graph where nodes represent random variables (RVs) and edges represent conditional dependence of variables in the direction specified in the graph.

One of the most important characteristics of BNs is the relative ease of the inference process. For instance, the use of a specific context $C = c$ over a set of variables (i.e., values assigned to them) can facilitate computation of the posterior probability of the remaining variables given the context, $P(X \mid C = c)$ (Boutilier et al. 1996). Even though it has been demonstrated that the exact inference problem is NP-hard for arbitrary BNs (Cooper 1990), in some cases the contextual structure can be used for tractable inference.

In addition to inference, BNs can be utilized for sampling. The sampling process generally involves determining a topological ordering of the variables (i.e., $X_1, \ldots, X_n$), then iteratively drawing the value for each RV given the previously sampled values (i.e., the context $C = c$). To draw the value of a specific RV $X_i$, the method computes the corresponding probability distribution $P(X_i \mid C = c)$, samples the value of the variable, and adds the sampled value of $X_i$ to $C$, repeating the same process up to the last variable.
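The ordering-then-sampling loop described above can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `ancestral_sample`, the dictionary layout of the CPTs, and the three-variable chain are all assumptions made for the example.

```python
import random

def ancestral_sample(order, parents, cpt, rng):
    """Draw one joint sample from a binary BN by forward (ancestral) sampling.

    order   -- variable names in topological order (parents before children)
    parents -- dict: variable -> tuple of parent names
    cpt     -- dict: variable -> {tuple of parent values: P(variable = 1 | parents)}
    """
    context = {}  # the growing context C = c of already-sampled values
    for var in order:
        # Look up P(var = 1 | sampled parent values) and draw the value.
        pa_values = tuple(context[p] for p in parents[var])
        p_one = cpt[var][pa_values]
        context[var] = 1 if rng.random() < p_one else 0
    return context

# Hypothetical three-variable chain A -> B -> C.
order = ["A", "B", "C"]
parents = {"A": (), "B": ("A",), "C": ("B",)}
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.0, (1,): 0.8},
    "C": {(0,): 0.0, (1,): 0.5},
}
sample = ancestral_sample(order, parents, cpt, random.Random(0))
```

In this toy chain a child can only be 1 when its parent is 1, which is the kind of structure the paper later exploits.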

Considering the relevance of BNs and their sampling process, BNs can also be utilized to model the formation and structure of relational networks, i.e., social, information, and biological networks, where nodes correspond to entities and links represent relations among the entities (such as friendship links in Facebook). In this paper, we show that probabilistic generative network models (GNMs)* can be reduced to BNs, and that BN sampling methods can be applied to generate networks. Some well-known GNMs are Erdős-Rényi (Erdős and Rényi 1959), Chung-Lu (Chung and Lu 2002), and the Kronecker product graph model (KPGM) (Leskovec et al. 2010).

GNMs model the distribution of networks $G = (V, E)$ with set of nodes $V$ and edges $E$ through binary random variables (typically one per possible edge in the network). In particular, the random variable $E_{ij}$ models the existence of an edge between nodes $V_i \in V$ and $V_j \in V$, where $P(E_{ij}) = \pi_{ij}$. This results in a total of $|V|^2$ RVs. The naive process for sampling a network from a GNM samples each possible edge independently using a Bernoulli distribution. When the sample is a success (i.e., $E_{ij} = 1$), the edge is added to the set of edges $E$. Unfortunately, naive sampling has time complexity $O(|V|^2)$, which makes it impractical for modeling large networks. While there are some sampling algorithms with time complexity proportional to the number of edges ($O(|E|)$), most of these algorithms are provably incorrect, i.e., they generate improbable networks from the underlying distribution (Moreno et al. 2014).
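As a minimal sketch of the naive procedure just described, the loop below performs one Bernoulli trial per cell of a probability matrix. The function name `naive_sample` and the 3 x 3 matrix are illustrative assumptions; the quadratic number of trials is what makes this approach impractical for large networks.

```python
import random

def naive_sample(prob_matrix, rng):
    """Sample an edge set with one Bernoulli trial per cell: |V|^2 trials."""
    edges = set()
    n = len(prob_matrix)
    for i in range(n):
        for j in range(n):
            # Success with probability pi_ij adds edge (i, j).
            if rng.random() < prob_matrix[i][j]:
                edges.add((i, j))
    return edges

# Hypothetical matrix of edge probabilities.
P = [[0.9, 0.1, 0.0],
     [0.1, 0.5, 1.0],
     [0.0, 1.0, 0.2]]
edges = naive_sample(P, random.Random(1))
```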

Furthermore, some GNMs generate networks with properties that differ from those observed in real-world networks (e.g., transitivity, assortativity). Generating realistic random networks is important for prediction, hypothesis testing, generation of data for evaluation, randomization of sensitive data, etc. This is the motivation behind several new GNMs with more complex dependencies between the edge RVs (e.g., mKPGM (Moreno et al. 2010) and BTER (Seshadhri, Kolda, and Pinar 2012)).

*GNMs should not be confused with probabilistic graphical models, such as Bayesian networks. To avoid confusion we will refer to probabilistic graphical models as “graphs”, and to networks sampled from GNMs as “networks”, except for Bayesian networks, which are widely known as such.

Figure 1: Left: matrix of probabilities (grayscale depicting probability values from 0 (white) to 1 (black)). Center: sampled adjacency matrix ($E_{ij} = 0$ (white) and $E_{ij} = 1$ (black)). Right: sampled network.

For simple GNMs with independent binary RVs $E_{ij}$, transformation to a BN representation is not necessary. However, for some of the more recent GNMs with complex structure due to latent variables and dependencies among the edges, a BN representation can be useful for sampling and inference. Specifically, we can take advantage of existing concepts and algorithms from research on BNs, particularly from inference and learning. For example, we could (1) compactly represent the edge dependencies in the network, and (2) develop more efficient sampling mechanisms based on the conditional independence/dependence relationships encoded in the graphical model structure.

In this paper, we consider mixed Kronecker Product Graph Models (mKPGMs) (Moreno et al. 2010). We show how an mKPGM can be represented as a Bayesian network with a hierarchy of latent variables that represent activations of clusters of edges at different levels in the network. Then, we consider the use of context-specific independence (CSI) to facilitate the inference process and posterior sampling; however, it cannot be used to significantly reduce the time complexity of the sampling process. We then formalize the notion of context-specific dependence (CSD) and deterministic context-specific dependence (DCSD) for hierarchical GNMs. Specifically, CSD is simply CSI's complementary concept and DCSD is an extreme form (i.e., deterministic CSD). We discuss how to improve the sampling process of a GNM by exploiting the DCSD property and iteratively sampling a hierarchy of latent variables that represent cluster activations at different levels.


Background and Related Work 

Our work is related to CSI in probabilistic relational models where the RVs are predefined. However, in our analysis we encounter a varying number of RVs and configurations, as opposed to the case of probabilistic relational models. The most representative work on CSI for probabilistic relational models is that of (Fierens 2010). Also close to our analysis is the work of (Pensar et al. 2015) and (Nyman et al. 2014), which deal with directed acyclic graphs and decomposable stratified graphical models, respectively. Both works make it possible to reduce the size of the conditional probability distributions (CPDs) used to calculate the joint distribution. Our work does not require calculating the joint; rather, it samples networks using randomization (which can be achieved through group probability sampling).


Bayesian Networks 

A Bayesian network (BN) is a directed acyclic graph where the nodes represent RVs and the edges represent (directed) dependencies between variables. More precisely, a node in a BN is an RV that is conditionally dependent on its parents. Thus, each node in the BN has an explicitly associated conditional probability, by design. Let $X_1, X_2, \ldots, X_n$ be a topological ordering of the nodes in the BN. Then $X_i$ is independent of $\{X_1, \ldots, X_{i-1}\} \setminus pa(X_i)$ given $pa(X_i)$. Consequently, the BN implicitly represents conditional independence relations. This simplifies the computation of the joint distribution of the RVs, which can be stated simply as:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))$$


Bayesian Network Independence Properties 


The two main properties of BNs that are exploited for inference are conditional independence (CI) and context-specific independence (CSI) (Boutilier et al. 1996). We describe CI and CSI (later we derive the related properties CSD and DCSD) without detailing how particular inference algorithms use these properties, to simplify the exposition. CI is the main characteristic of the structure of BNs, whereby the joint distribution can be represented by focusing on the conditional dependencies of RVs. The idea is that the joint distribution can be computed more efficiently by using the conditional independence relations to discard RVs that do not affect the computation, so that only the relevant nodes are considered rather than all of them. This leads to a more efficient estimation of the conditional probability distributions of the RVs. The posterior distribution of some RVs can be computed in a tractable manner when other variables are observed, because only certain variables have an impact on the distribution of a node in the BN (the node’s parents, its children, and its children’s other parents). These variables comprise the node’s Markov blanket.

CSI is another important inference property in BNs, and is less restrictive than CI. The idea is that certain independence relations may hold only under certain realizations of RVs, i.e., only when certain RV values are observed. In such scenarios, even if CI is not present, the context of the RVs still makes inference possible. This less restrictive condition arises more frequently than CI, particularly in relational models (Fierens 2010). Below, we adapt the definition of CSI from (Boutilier et al. 1996) and (Fierens 2010).


Definition 1. Context-specific independence: Let $X$, $Y$ and $W$ be distinct sets of RVs. Then $X \perp_c Y \mid W = w$ (which reads: $X$ is context-specific independent of $Y$ given $W = w$) if $P(X \mid Y, W = w) = P(X \mid W = w)$ whenever $P(Y, W = w) > 0$.
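To make Definition 1 concrete, the following sketch checks CSI on a small conditional probability table for three binary RVs. The table values and the helper name `is_csi` are hypothetical, and the check assumes every listed $(y, w)$ combination has positive probability.

```python
def is_csi(cpt, w, tol=1e-12):
    """Check X independent of Y in the context W = w.

    cpt maps (y, w) -> P(X = 1 | Y = y, W = w) for binary X, Y, W.
    In context w, X is context-specifically independent of Y when the
    conditional does not change with y.
    """
    values = [p for (y, w_val), p in cpt.items() if w_val == w]
    return max(values) - min(values) <= tol

# Hypothetical table: Y is irrelevant when W = 0 but matters when W = 1.
cpt = {(0, 0): 0.2, (1, 0): 0.2,
       (0, 1): 0.1, (1, 1): 0.7}

csi_in_w0 = is_csi(cpt, 0)
csi_in_w1 = is_csi(cpt, 1)
```

Here the independence holds only in the context $W = 0$, so it would not show up as an ordinary conditional independence in the graph structure.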
Figure 2: KPGM (a) and mKPGM (b) for $K = 3$ and $\ell = 2$.


While CI and CSI are properties consistently used for inference in the BN research community, our task is not to infer unobserved RVs. Instead, we would like to take advantage of inference mechanisms for the realization of RVs, i.e., for sampling.


Generative Network Models 


The goal of GNMs is to generate random networks $G$ from a certain network distribution $P(G)$. One of the most popular mechanisms used to generate $G$ is to produce a matrix of edge probabilities $\mathcal{P}$ from which a network’s adjacency matrix is sampled. Figure 1 shows a matrix of edge probabilities $\mathcal{P}$ (left) from which a random adjacency matrix is sampled (center), with its corresponding sampled network (right). For example, $\mathcal{P}[7, 8] = P(E_{78}) = \pi_{78}$ has a high probability (dark cell, left plot), and the edge $e_{78}$ is sampled (black cell, center plot). Next, we describe two GNMs that are complex enough to incorporate several levels of RVs.


Block two-level Erdős-Rényi (BTER) model: The block two-level Erdős-Rényi (BTER) model (Seshadhri, Kolda, and Pinar 2012) is a GNM where networks are sampled in three steps. First, a preprocessing step groups nodes of (almost) the same degree into blocks. Second, the so-called phase 1 of the algorithm creates conventional Erdős-Rényi graphs for each block, i.e., each edge is created independently with equal probability within the block. The number of edges sampled depends on a parameter provided to the algorithm and on the lowest-degree node in the block. Last, the blocks are linked using a Chung-Lu model (Chung and Lu 2002), which is a type of weighted Erdős-Rényi model.


Sampling from GNMs

mKPGM sampling: Given the parameter matrix $\Theta$ with $\dim(\Theta) = b \times b$ ($\forall i, j:\ \theta_{ij} \in [0, 1]$), the number of Kronecker multiplications $K$, and the number of untied levels $\ell$, mKPGM generates a network as follows. First, it computes $\mathcal{P}^{\ell}$ by $\ell - 1$ Kronecker products of $\Theta$ with itself. Second, it samples a network $G^{\ell} = (V^{\ell}, E^{\ell})$ from $\mathcal{P}^{\ell}$ by sampling each cell independently from a Bernoulli($\mathcal{P}^{\ell}_{ij}$). Third, the algorithm calculates $\mathcal{P}^{\ell+\lambda} = G^{\ell+\lambda-1} \otimes \Theta$ and samples $G^{\ell+\lambda}$ for $\lambda = 1, \ldots, K - \ell$ as before. This iterative process of Kronecker multiplications and sampling ties parameters and increases the variability of the networks generated by the model. $\lambda$ indexes a tying iteration in the mKPGM sampling process. We will refer to each cell sampled with mKPGM as an RV with Bernoulli distribution. Notice that these RVs represent edges in the last tying iteration of the mKPGM sampling process and sets of edges (clusters) at higher levels of the mKPGM tying iterations.
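The three steps can be sketched in plain Python as below. This is an illustrative cell-by-cell version under stated assumptions: matrices are lists of lists, `kron`, `bernoulli_matrix`, and `sample_mkpgm` are hypothetical helper names, and every cell is sampled explicitly, so it is the naive variant rather than an efficient sampler.

```python
import random

def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def bernoulli_matrix(P, rng):
    """Sample each cell independently: 1 with probability P[i][j], else 0."""
    return [[1 if rng.random() < p else 0 for p in row] for row in P]

def sample_mkpgm(theta, K, ell, rng):
    # Untied part: P^ell from ell - 1 Kronecker products, then sample G^ell.
    P = theta
    for _ in range(ell - 1):
        P = kron(P, theta)
    G = bernoulli_matrix(P, rng)
    # Tied part: for each tying iteration, multiply the sampled network
    # by theta and sample again.
    for _ in range(K - ell):
        P = kron(G, theta)
        G = bernoulli_matrix(P, rng)
    return G

theta = [[0.9, 0.5],
         [0.5, 0.2]]   # hypothetical 2 x 2 parameter matrix
G = sample_mkpgm(theta, K=3, ell=2, rng=random.Random(0))
```

Because each tied level multiplies a sampled 0/1 matrix by $\Theta$, a zero cell at one level forces a whole block of zeros at every later level.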

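Figure 2 shows an example of KPGM and mKPGM with parameters $K = 3$, $\ell = 2$, $b = 2$, and $\Theta = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix}$. KPGM generates the probability matrix $\mathcal{P}$ (left column, $k = 3$) before sampling the final network (right column, $k = 3$). Instead, mKPGM samples $G^{2}$ at $k = 2 = \ell$. Then, it generates $\mathcal{P}^{3} = G^{2} \otimes \Theta$ and samples $G^{3}$ for $\lambda = 1$.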


Group Sampling: Group Probability sampling (GP) is a general sampling method that can be applied to many types of GNMs. It is an alternative to the usual sampling approach of most GNMs, where edges are sampled one by one. Instead, GP samples groups of edges that all share the same probability of being sampled. GP is an unbiased, provably correct, and efficient sampling process that can be applied to any GNM that defines a matrix $\mathcal{P}$ of edge probabilities. Given a GNM with parameter $\Theta$ that defines $\mathcal{P}$, GP samples a network in three steps. First, it derives $U$, the set of unique probabilities ($\pi_k$) in $\mathcal{P}$ as determined by the GNM. Second, for each $\pi_k \in U$ it calculates $T_k$, the number of possible edges associated with $\pi_k$, and samples the number of edges $x_k$ to be placed among the $T_k$ possible ones, with $P(X_k = x_k) \sim Bin(n, p)$ where $n = T_k$ and $p = \pi_k$ (because the number of successes in $T_k$ Bernoulli trials with probability $\pi_k$ is binomially distributed). Third, it samples $x_k$ edges at random among the $T_k$ possible edges with probability $\pi_k$. This process can be applied to each tied iteration $\lambda$ of the mKPGM model. For further details of GP sampling for mKPGM, please refer to (Moreno et al. 2014).
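The three GP steps can be illustrated with the sketch below. It is not the implementation from (Moreno et al. 2014): the function name is hypothetical, the groups are built by enumerating every cell of an explicit matrix, and the binomial draw is simulated as a sum of Bernoulli trials. Both shortcuts keep the example self-contained but cost time proportional to the number of cells, whereas an efficient version would derive $T_k$ from the model and use a direct binomial sampler.

```python
import random
from collections import defaultdict

def group_probability_sample(P, rng):
    """GP-style sampling sketch for a probability matrix P (list of lists)."""
    # Step 1: group the cells by their unique probability pi_k.
    groups = defaultdict(list)
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            groups[p].append((i, j))
    edges = set()
    for pi_k, cells in groups.items():
        T_k = len(cells)
        # Step 2: x_k ~ Bin(T_k, pi_k), drawn here as a sum of Bernoulli trials.
        x_k = sum(rng.random() < pi_k for _ in range(T_k))
        # Step 3: place x_k edges uniformly at random among the T_k cells.
        edges.update(rng.sample(cells, x_k))
    return edges

# Hypothetical matrix with three unique probabilities.
P = [[1.0, 0.0],
     [0.5, 0.5]]
edges = group_probability_sample(P, random.Random(7))
```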


mixed Kronecker Product Graph Model (mKPGM): mKPGM is a generalization of the Kronecker Product Graph Model (KPGM) (Leskovec et al. 2010). KPGM generates a matrix of edge probabilities $\mathcal{P}$ by $K - 1$ Kronecker products of a matrix of parameters $\Theta$, of size $b \times b$, with itself. The value of $K$ is chosen to yield the desired target number of nodes: given that $\dim(\Theta) = b \times b$, we have $b^K = |V|$. Once $\mathcal{P}$ is calculated, the final network is sampled. In contrast, mKPGM uses parameter tying to capture the characteristics of a network population (Moreno et al. 2010), as described in the paragraph on mKPGM sampling.
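As a small illustration of the KPGM construction, the sketch below builds the edge-probability matrix by repeated Kronecker products. The helper names and the parameter values are assumptions chosen for the example.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kpgm_probabilities(theta, K):
    """Edge-probability matrix from K - 1 Kronecker products of theta with itself."""
    P = theta
    for _ in range(K - 1):
        P = kron(P, theta)
    return P

theta = [[0.9, 0.5],
         [0.5, 0.2]]   # hypothetical 2 x 2 parameter matrix, so b = 2
P = kpgm_probabilities(theta, K=3)
num_nodes = len(P)                      # b^K
expected_edges = sum(map(sum, P))       # sum of all edge probabilities
```

With $b = 2$ and $K = 3$ the matrix has $b^K = 8$ rows, and the sum of its entries equals the sum of the entries of $\Theta$ raised to the power $K$.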


Generative Network Models 
Represented as Bayesian Networks 

Bayesian networks can be used to represent the relationships between RVs in GNMs. As we mentioned in the introduction, for some GNMs, since the edge RVs are independent, it is unnecessary to consider a BN representation. For example, the model in Figure 1 corresponds to an 8-node undirected network with no self-loops, thus there are 28 independent edge RVs. However, a BN representation is more appropriate for new models with more complex dependencies among the edges (such as mKPGM and BTER), and inference or sampling can be done based on the associated graphical models.

Figure 3: Left: RVs of an mKPGM sampling process. Right: plate-notation BN equivalent of the same mKPGM RVs.

Remark. An mKPGM model $\mathcal{M}$ with parameters $\Theta$, $K$, and $\ell$ can be represented as a BN $\mathcal{N}$ with a tree structure and parameters $\Theta'$ obtained from $\Theta$: $\mathcal{M}_{\Theta} \equiv \mathcal{N}_{\Theta'}$.

The mKPGM model consists of multiple levels of RVs. The first of these levels corresponds to the Bernoulli parameters in $\mathcal{P}^{\ell}$, the probability matrix used to generate the subnetwork $G^{\ell} = (V^{\ell}, E^{\ell})$. Each possible edge $E^{\ell}_{ij} \in E^{\ell}$ is generated with probability $P(E^{\ell}_{ij}) = \mathcal{P}^{\ell}_{ij}$. With a $b \times b$ parameter matrix $\Theta$, there are $(b^{\ell})^2 = b^{2\ell} = |V^{\ell}|^2$ possible edges. These potential edges, at the top of the hierarchy, can be modeled as independent RVs in a BN (i.e., the root nodes of the BN). Let $z^{[0]}_{ij}$ be the RV in the BN representing the edge $E^{\ell}_{ij}$. Then the BN representation of this level of the hierarchy corresponds to independent RVs, with:

$$P(z^{[0]}_{ij} = 1) = \mathcal{P}^{\ell}[i, j]$$

$$P(z^{[0]}_{ij} = 0) = 1 - \mathcal{P}^{\ell}[i, j]$$

More generally, we will use the notation $z^{[\lambda]}_{ij}$ to refer to RVs in the BN representation, where $\lambda \in [0, K - \ell]$ refers to the level of tying in the mKPGM. The first level ($\lambda = 0$) refers to the untied portion of the mKPGM. For notational purposes, we will use $Z^{[\lambda]}$ to refer to the set of all RVs $z^{[\lambda]}_{ij}$.

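The next level corresponds to the Kronecker product of $G^{\ell}$ with $\Theta$, which produces $\mathcal{P}^{\ell+1} = G^{\ell} \otimes \Theta$. There are $(b^{\ell+1})^2 = b^{2(\ell+1)}$ possible edges in the next level of the hierarchy, with each edge $E^{\ell}_{ij}$ impacting $b^2$ of the edges in $E^{\ell+1}$ due to the Kronecker product (i.e., each $\mathcal{P}^{\ell+1}_{uv}$ is generated from $\mathcal{P}^{\ell+1}_{uv} = E^{\ell}_{ij}\,\theta_{xy}$ for some $i, j \in [1, b^{\ell}]$ and $x, y \in [1, b]$ s.t. $\theta_{xy} \in \Theta$).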

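The BN representation of this level of the hierarchy consists of a random variable $z^{[1]}_{uv}$ for each edge $E^{\ell+1}_{uv}$, for a total of $b^{2(\ell+1)}$ RVs. The Kronecker product relationships are modeled by dependencies in the BN, so each $z^{[0]}_{ij} \in Z^{[0]}$ has $b^2$ descendants in $Z^{[1]}$. Thus the RVs in $Z^{[1]}$ can be thought of as $b^{2\ell}$ sets of RVs, each of size $b^2$, which share a common parent in $Z^{[0]}$. For an edge $E^{\ell+1}_{uv}$ that is generated via $E^{\ell}_{ij}\,\theta_{xy}$, the conditional probability for its associated RV is:

$$P(z^{[1]}_{uv} = 1 \mid z^{[0]}_{ij} = 1) = \theta_{xy}$$

$$P(z^{[1]}_{uv} = 0 \mid z^{[0]}_{ij} = 1) = 1 - \theta_{xy}$$

$$P(z^{[1]}_{uv} = 1 \mid z^{[0]}_{ij} = 0) = 0$$

$$P(z^{[1]}_{uv} = 0 \mid z^{[0]}_{ij} = 0) = 1$$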
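The conditional probability table of a child RV can be written as a small function. This is an illustrative sketch with a hypothetical name and argument layout; it relies only on the fact that each row of a CPT must sum to one, so an inactive parent forces the child to 0 with probability 1.

```python
def cpt_child(child_value, parent_value, theta_xy):
    """P(child = child_value | parent = parent_value) for one mKPGM edge RV."""
    if parent_value == 0:
        # Inactive parent: the child is 0 deterministically.
        return 1.0 if child_value == 0 else 0.0
    # Active parent: the child is a Bernoulli(theta_xy) variable.
    return theta_xy if child_value == 1 else 1.0 - theta_xy

# Rows of the 2 x 2 table for a hypothetical theta_xy = 0.7.
row_active = [cpt_child(v, 1, 0.7) for v in (0, 1)]
row_inactive = [cpt_child(v, 0, 0.7) for v in (0, 1)]
```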

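The remaining levels of the mKPGM can be transformed by the same process. In general, a level $\lambda$ of the mKPGM hierarchy is represented by a set of $|V^{\ell+\lambda}|^2$ RVs in $Z^{[\lambda]}$, where $|V^{\ell+\lambda}| = b^{\ell+\lambda}$ is the number of nodes in the graph $G^{\ell+\lambda}$. Each $z^{[\lambda]}_{uv} \in Z^{[\lambda]}$ has one parent in $Z^{[\lambda-1]}$, and each $z^{[\lambda-1]}_{ij} \in Z^{[\lambda-1]}$ has $b^2$ descendants in $Z^{[\lambda]}$.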

This process generates a tree structure where groups of $b^2$ RVs have the same parent in $Z^{[\lambda-1]}$. Two variables $z^{[\lambda]}_{ij}$ and $z^{[\phi]}_{uv}$ at levels $\lambda$ and $\phi$ are dependent if they share a common ancestor.

The final BN $\mathcal{N}$ consists of all the RVs $Z^{[0]}, Z^{[1]}, \ldots, Z^{[K-\ell]}$ and their associated probabilities. This shows that the BN $\mathcal{N}$ represents the model $\mathcal{M}$, i.e., $\mathcal{M} \equiv \mathcal{N}$.

An example BN representation of an mKPGM is visualized in Figure 3 for $\lambda = 0, 1, 2, 3$. Here $\lambda = 0$ corresponds to $G^{\ell}$ in the mKPGM sampling process. There is a total of $(b^{\ell})^2 = b^{2\ell}$ RVs, each represented by a $z^{[0]}_{ij}$. Note that the double subindex of the $Z$ RVs indicates the position of the RV in the cluster/edge matrix. Each of these RVs has $b^2$ descendants at $\lambda = 1$. However, to make it easier to visualize the relations among the variables in the left subplot, we drop the descendants for all RVs except one in each level of the hierarchy. In the right subplot, the descendants are represented more generally by the plate notation.

We note that the tree structure of the GNM-associated BN, along with the recursive nature of the GNM and the symmetries among RVs with the same probability, would make it amenable to lifted inference. However, in this paper our discussion is centered on the problem of sampling.

Sampling from Bayesian Networks 

Given that an mKPGM can be reduced to a BN, we now consider sampling from the associated BN to generate a network from the underlying mKPGM model. The process to sample from an empty BN (one with no observed evidence) is straightforward. It involves determining a topological sorting of the RVs, then iteratively sampling a value for each RV conditioned on the sampled values of its parents. We discuss below how the structure of the associated BN can be exploited to speed up this process. However, we note that the complexity increases if the sampling is conditioned on evidence, and the BN representation will facilitate even further gains for these more complex inference tasks.
















Naive Sampling Using Conditional Independence 

Given that the BN for mKPGMs is tree-structured, it is easy to determine a topological sort that will facilitate sampling. Specifically, each tree rooted at an RV in $Z^{[0]}$ is independent of the others. Moreover, within a particular tree, at level $\lambda$, each $z^{[\lambda]}_{ij}$ is conditionally independent of the others ($Z^{[\lambda]} - \{z^{[\lambda]}_{ij}\}$) given the value of its parent in $Z^{[\lambda-1]}$. Thus it is simple to use the hierarchy itself as the topological ordering for sampling. Given that the RVs at level $\lambda$ of the hierarchy are conditionally independent once all the RVs from level $\lambda - 1$ are sampled, the order in which the RVs are sampled within the same level is not important. Furthermore, since each CPT corresponds to a $2 \times 2$ matrix (where if the parent value is zero the RV has zero probability of being sampled, otherwise it has a probability equal to some $\theta_{xy} \in \Theta$), sampling each RV value takes constant time. Thus, the complexity of sampling is a function of the number of RVs in the BN. Unfortunately, the number of RVs increases at each level of the hierarchy. The number of RVs at hierarchy level $\lambda$ is equal to $(b^{\ell+\lambda})^2$, so this results in a total number of RVs:

$$\sum_{\lambda=0}^{K-\ell} (b^{\ell+\lambda})^2 = \sum_{\lambda=0}^{K-\ell} (b^2)^{\ell+\lambda} = \frac{(b^2)^{K+1} - (b^2)^{\ell}}{b^2 - 1}$$

which is larger than the number of possible edges in the network: $N_v^2 = b^{2K}$.
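As a worked check of this count, take the setting of Figure 2 ($b = 2$, $\ell = 2$, $K = 3$), so that $\lambda \in \{0, 1\}$. The level-by-level sum and the closed-form geometric series agree, and both exceed the number of cells in the final adjacency matrix:

```latex
\sum_{\lambda=0}^{1} \left(2^{2+\lambda}\right)^2 = 2^4 + 2^6 = 16 + 64 = 80,
\qquad
\frac{(2^2)^{4} - (2^2)^{2}}{2^2 - 1} = \frac{256 - 16}{3} = 80,
\qquad
N_v^2 = 2^{2 \cdot 3} = 64 .
```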


Context-Specific Independence for Network 
Sampling 

Context-specific independence (CSI) could be used to improve sampling efficiency by either reducing the size of the CPTs or simplifying the ordering of RVs (e.g., facilitating parallelization).

To exploit CSI, we first need to identify the context in the mKPGM for which independence between random variables arises. Recall that for three sets of RVs $X$, $Y$, $W$, the definition of CSI is $X \perp_c Y \mid W = w$ if $P(X \mid Y, W = w) = P(X \mid W = w)$. Since each RV in the mKPGM BN has a single parent, with the topological ordering discussed above there is no opportunity to use CSI to improve the efficiency of the sampling process. However, CSI could be useful for more complicated inference tasks that condition on evidence.


Context-Specific Dependence 
for Network Sampling 

We now formalize the concept of context-specific dependence (CSD). Note that in the definition, $W$ can be any set of RVs in a BN and is not necessarily related to the RVs for mKPGM.

Definition 2. Context-specific dependence: Let $X$, $Y$ and $W$ be distinct sets of RVs. Then $X \not\perp_c Y \mid W = w$ if $P(X \mid Y, W = w) \neq P(X \mid W = w)$ whenever $P(Y, W = w) > 0$.

Both CSI and CSD may appear in the graphical models of GNMs. Whenever independence of RVs in a BN appears due to a specific context, CSI properties can be exploited to relax the constraints on inference and sampling. On the other hand, the BN representation itself generally implies CSD, since it is assumed that an RV depends on the value of its parents. However, if the CSD produces more structure (e.g., additional symmetry, more extreme dependence), then its properties can be exploited to tighten the constraints on inference and sampling.

In GNMs, the BN structure has a more specific dependency that can be used for efficient sampling:

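Definition 3. Deterministic CSD (DCSD) in mKPGMs: Let $\mathcal{M}$ be an mKPGM with associated BN $\mathcal{N}$. Let $\pi^{[\lambda]}_{ij}$ be the probability in $\mathcal{N}$ that the RV $z^{[\lambda]}_{ij} = 1$. $\mathcal{M}$ is deterministic context-specific dependent if at each layer $\lambda$, it partitions all RVs $z^{[\lambda]}_{ij}$ such that:

$$P\left(z^{[\lambda]}_{ij} = 1 \mid pa(z^{[\lambda]}_{ij}) = 0\right) = 0 \quad \forall i, j, \lambda$$

where $P\left(z^{[\lambda]}_{ij} = 1 \mid pa(z^{[\lambda]}_{ij}) = 1\right) > 0 \quad \forall i, j, \lambda$.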


Combining the hierarchical-order sampling process discussed previously with DCSD, we can reduce the complexity of sampling a network. Specifically, once the $|V^{\ell}|^2$ RVs are sampled from the first hierarchy level ($\lambda = 0$), instead of sampling all variables of the second level ($Z^{[1]}$), we avoid considering the RVs with parent values of zero. This results in a considerable reduction in the number of sampled RVs, which is propagated down the hierarchy. For example, if $z^{[0]}_{ij} = 0$, we avoid sampling $(b^2)^{K-\ell}$ RVs (i.e., $b^2$ descendants are recursively affected at each of the $\lambda = K - \ell$ levels). Let $N^{[\lambda]}$ be the number of active RVs (i.e., with value 1) at layer $\lambda$. Then the number of variables to be sampled in the next level is equal to $b^2 N^{[\lambda]}$ (each variable has $b^2$ descendants). As demonstrated in previous work on mKPGMs, the expected number of edges at layer $\lambda$ is $\left(\sum_{ij} \theta_{ij}\right)^{\ell+\lambda}$ (Moreno et al. 2010). Thus, in expectation, the total number of RVs sampled using DCSD is of the order of $\sum_{\lambda=0}^{K-\ell} \left(\sum_{ij} \theta_{ij}\right)^{\ell+\lambda}$. Also, since we only analyze RVs with an active parent, the CPT lookup can be reduced to a single value. These simplifications produce a considerable reduction in the time complexity of the network sampling process.
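The pruning idea can be sketched as follows. This is an assumption-laden illustration, not the paper's sampler: it draws each child of an active parent with its own Bernoulli trial instead of using group probability sampling, and the function name, argument names, and return value are hypothetical. It also counts how many RVs were actually sampled, so the saving from skipping inactive parents is visible.

```python
import random

def sample_mkpgm_dcsd(P_ell, theta, num_tied, rng):
    """Sample level by level, visiting only children of active (value 1) RVs.

    P_ell    -- probability matrix of the untied level (lambda = 0)
    theta    -- b x b parameter matrix
    num_tied -- number of tied levels, K - ell
    Returns (set of active cells at the last level, number of RVs sampled).
    """
    b = len(theta)
    # Level 0: every root RV must be sampled.
    active = {(i, j) for i, row in enumerate(P_ell)
              for j, p in enumerate(row) if rng.random() < p}
    sampled = len(P_ell) * len(P_ell[0])
    for _ in range(num_tied):
        next_active = set()
        for (i, j) in active:          # inactive parents are skipped entirely
            for x in range(b):
                for y in range(b):
                    sampled += 1
                    # With an active parent the CPT lookup is just theta[x][y].
                    if rng.random() < theta[x][y]:
                        next_active.add((i * b + x, j * b + y))
        active = next_active
    return active, sampled

ones = [[1.0, 1.0], [1.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
dense_edges, dense_count = sample_mkpgm_dcsd(ones, ones, 1, random.Random(0))
empty_edges, empty_count = sample_mkpgm_dcsd(zeros, ones, 1, random.Random(0))
```

In the two degenerate cases above, a fully active first level requires sampling all 16 children, while a fully inactive one requires none beyond the 4 roots.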

It is important to note that exploiting DCSD for mKPGM sampling will generate networks from the true network distribution as long as GP sampling is applied to randomly sample from RVs with the same probability at each tied iteration. This is because GP sampling generates networks from the true network distribution (Moreno et al. 2014).


Complexity Analysis Comparison 

As stated before, the sampling process is the same for all BNs regardless of the method used (CI or DCSD). This process involves determining a topological sorting of the RVs, then iteratively sampling a value for each RV conditioned on the sampled values of its parents. Consequently, the difference in performance between the methods depends on two factors: the number of RVs to be sampled, and the complexity of the CPT lookup needed to sample the RVs.










Property       Number of RVs                                        pa values
CI             $\frac{(b^2)^{K+1} - (b^2)^{\ell}}{b^2 - 1}$         2
DCSD           $\sum_{\lambda=0}^{K-\ell} N^{[\lambda]}$            1
ebound DCSD    $(K - \ell + 1)\, b^{K+2}$                           1


Table 1: Complexity of GNM sampling methods that exploit different properties of the associated BN.

Table 1 shows a comparison of the number of sampled RVs and the number of parent combinations in the CPTs for the sampling methods discussed in the paper. Recall that $b$ corresponds to the size of the original parameter matrix ($\dim(\Theta) = b \times b$), $K$ defines the number of Kronecker products, $\ell$ is the number of independent hierarchy levels for mKPGM, and thus $\lambda \in \{0, \ldots, K - \ell\}$.

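DCSD allows more efficient sampling than CI because the number of RVs is smaller than in CI: $\frac{(b^2)^{K+1} - (b^2)^{\ell}}{b^2 - 1} > \sum_{\lambda=0}^{K-\ell} \left(\sum_{ij} \theta_{ij}\right)^{\ell+\lambda}$. This is easy to verify. Assuming each entry of $\Theta$ with size $b \times b$ is a valid probability and hence $\theta_{ij} < 1$, then $b^2 > \sum_{ij} \theta_{ij} \geq 0$. Then, $\left(\sum_{ij} \theta_{ij}\right)^{\ell+\lambda} < (b^2)^{\ell+\lambda}$ for every $\lambda$, and summing over $\lambda$ gives the inequality.

It is worth noticing the relation between the number of possible edges $N_v^2 = b^{2K}$ and the number of RVs in CI and DCSD. $N_v^2$ is equal to the last term of $\sum_{\lambda=0}^{K-\ell} (b^2)^{\ell+\lambda}$. On the other hand, the last term of $\sum_{\lambda=0}^{K-\ell} \left(\sum_{ij} \theta_{ij}\right)^{\ell+\lambda}$ is $\left(\sum_{ij} \theta_{ij}\right)^{K}$, the expected number of edges in the final network.

Finally, most real networks are sparse, which means $|E| = O(N_v) = O(b^K)$. However, the number of RVs using CI is larger than $N_v^2$. In expectation, each level of the mKPGM hierarchy will sample $O(b^{\ell+\lambda})$ edges. The total number of sampled RVs is bounded by $\sum_{\lambda=0}^{K-\ell} b^{\ell+\lambda+2} \leq \sum_{\lambda=0}^{K-\ell} b^{K+2} = (K - \ell + 1)\, b^{K+2}$. This bound in expectation (ebound) is significantly less than $N_v^2$.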
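To get a feel for the magnitudes involved, the following sketch evaluates the CI count, the expected-case bound, and the number of possible edges for one setting. The formulas are taken as read from Table 1 (a closed-form geometric sum for CI and $(K - \ell + 1)\, b^{K+2}$ for the bound), and the parameter values and function names are hypothetical.

```python
def num_rvs_ci(b, K, ell):
    """Closed-form number of RVs sampled under CI."""
    return ((b * b) ** (K + 1) - (b * b) ** ell) // (b * b - 1)

def ebound_dcsd(b, K, ell):
    """Expected-case bound (K - ell + 1) * b^(K + 2) for DCSD."""
    return (K - ell + 1) * b ** (K + 2)

def possible_edges(b, K):
    """N_v^2 = b^(2K), the number of cells in the final adjacency matrix."""
    return b ** (2 * K)

b, K, ell = 2, 10, 5   # hypothetical setting: 1024 nodes, 5 tied levels
ci = num_rvs_ci(b, K, ell)
bound = ebound_dcsd(b, K, ell)
cells = possible_edges(b, K)
```

For this setting the bound is roughly 40 times smaller than the number of possible edges, while the CI count exceeds it.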

Discussion, Current and Future Work 

CSI and CSD are complementary properties arising in graphical models, in which the context changes the constraints during inference, either by relaxing or by tightening them. By identifying and taking advantage of these properties, it is possible to perform more efficient inference and sampling.

We showed an example of a GNM that can be reduced to a graphical model, and that sampling can be approached from multiple perspectives. While sampling efficiencies based on CSI are not available for this type of BN, exploiting DCSD allows us to develop a faster sampling process (compared to conventional CI sampling). This improvement is primarily due to a reduction in the number of sampled RVs. Combined with group sampling, DCSD properties can be exploited for fast and provably correct sampling in other GNMs with complex dependencies, as in mKPGM. However, in mKPGMs the DCSD properties may also complicate inference tasks that condition on evidence, because the nature of DCSD constrains the problem and reduces the number of possible solutions. The implications of this are the subject of our ongoing work.


Acknowledgments 

This research is supported by NSF and DARPA under contract numbers IIS-1149789, CCF-0939370, and N660001-1-2-4014. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon.

References 

[Boutilier et al. 1996] Boutilier, C.; Friedman, N.; Goldszmidt, M.; and Koller, D. 1996. Context-specific independence in Bayesian networks. In Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence, UAI'96, 115-123.

[Chung and Lu 2002] Chung, F., and Lu, L. 2002. The average distances in random graphs with given expected degrees. PNAS 99(25):15879-15882.

[Cooper 1990] Cooper, G. F. 1990. The computational complexity of probabilistic inference using Bayesian belief networks (research note). Artif. Intell. 42(2-3):393-405.

[Erdős and Rényi 1959] Erdős, P., and Rényi, A. 1959. On random graphs, I. Publicationes Mathematicae (Debrecen) 6:290-297.

[Fierens 2010] Fierens, D. 2010. Context-specific independence in directed relational probabilistic models and its influence on the efficiency of Gibbs sampling. In Proceedings of the 2010 Conference on ECAI 2010: 19th European Conference on Artificial Intelligence, 243-248.

[Leskovec et al. 2010] Leskovec, J.; Chakrabarti, D.; Kleinberg, J.; Faloutsos, C.; and Ghahramani, Z. 2010. Kronecker graphs: An approach to modeling networks. JMLR 11(Feb):985-1042.

[Moreno et al. 2010] Moreno, S.; Kirshner, S.; Neville, J.; and Vishwanathan, S. 2010. Tied Kronecker product graph models to capture variance in network populations. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, 1137-1144.

[Moreno et al. 2014] Moreno, S.; Pfeiffer III, J.; Kirshner, S.; and Neville, J. 2014. A scalable method for exact sampling from Kronecker family models. In IEEE 14th International Conference on Data Mining (ICDM).

[Nyman et al. 2014] Nyman, H.; Pensar, J.; Koski, T.; and Corander, J. 2014. Stratified graphical models - context-specific independence in graphical models. Bayesian Anal. 9(4):883-908.

[Pearl 1988] Pearl, J., ed. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

[Pensar et al. 2015] Pensar, J.; Nyman, H.; Koski, T.; and Corander, J. 2015. Labeled directed acyclic graphs: a generalization of context-specific independence in directed graphical models. Data Mining and Knowledge Discovery 29(2):503-533.

[Seshadhri, Kolda, and Pinar 2012] Seshadhri, C.; Kolda, T. G.; and Pinar, A. 2012. Community structure and scale-free collections of Erdős-Rényi graphs. Physical Review E 85(5).










